Randy Koliha
Isaiah Sarria
Owen Telis
John Levitt
=======
Steam is a popular online video game distributor with many complementary features. Steam hosts community discussion threads, downloads of community-made modifications for games sold on the platform, social-media-like user profiles and friends lists that display a user's overall or recent video game activity, and sales events based on holidays or unique themes.
We analyze a dataset of Steam games to examine how gamers behave on the platform.
=======
Loading any needed packages.
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ───────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.6 ✓ dplyr 1.0.8
✓ tidyr 1.2.0 ✓ stringr 1.4.0
✓ readr 2.1.2 ✓ forcats 0.5.1
── Conflicts ──────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(scales)
Attaching package: ‘scales’
The following object is masked from ‘package:purrr’:
discard
The following object is masked from ‘package:readr’:
col_factor
library(directlabels)
The games.csv dataset contains data collected for games on Steam, the popular game-license selling platform, for each month from July 2012 to February 2021. The dataset has records for 1258 games (including some miscellaneous pieces of software) and includes their titles, monthly player peaks, monthly average concurrent players, monthly gains/losses in players compared to the previous month, and the average player count expressed as a percentage of the peak.
# Storing dataset in `games`
games <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-16/games.csv")
#cleaning it up
games <- games %>%
mutate(number_of_month = match(month, month.name)) %>% # month name -> 1..12, no hard-coded row count or column index
select(gamename,year,number_of_month,everything()) %>%
arrange(desc(year), number_of_month)
games
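The month lookup in the cleaning step relies on `match()` against R's built-in `month.name` vector. A minimal sketch of how that lookup behaves on a toy vector follows; the stray whitespace and the invalid entry are assumptions added for illustration, not values confirmed to occur in games.csv.

```r
# Toy stand-in for the `month` column. The padded and invalid values are
# hypothetical and only show how the lookup reacts to them.
month_raw <- c("January", "February ", " July", "December", "Smarch")

# trimws() strips leading/trailing whitespace; match() returns the position of
# each name in month.name, or NA when the name is not a month.
month_num <- match(trimws(month_raw), month.name)

month_num
```

Checking `sum(is.na(month_num))` after such a conversion is a cheap way to confirm that every month name was recognised.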
summary(games)
gamename year number_of_month month avg gain
Length:83631 Min. :2012 Min. : 1.000 Length:83631 Min. : 0.0 Min. :-250249.0
Class :character 1st Qu.:2016 1st Qu.: 3.000 Class :character 1st Qu.: 53.1 1st Qu.: -38.2
Mode :character Median :2018 Median : 7.000 Mode :character Median : 203.1 Median : -1.6
Mean :2017 Mean : 6.546 Mean : 2765.7 Mean : -10.3
3rd Qu.:2019 3rd Qu.:10.000 3rd Qu.: 764.0 3rd Qu.: 22.2
Max. :2021 Max. :12.000 Max. :1584886.8 Max. : 426446.1
NA's :1258
peak avg_peak_perc
Min. : 0 Length:83631
1st Qu.: 137 Class :character
Median : 500 Mode :character
Mean : 5470
3rd Qu.: 1727
Max. :3236027
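The summary lists `avg_peak_perc` as a character column, so it cannot be summarised numerically as it stands. A possible conversion is sketched below, assuming the raw values carry a trailing "%" sign; the sample values are made up for illustration.

```r
# Hypothetical sample values; the "%" suffix is an assumption about the raw format.
avg_peak_perc_raw <- c("45.4138%", "37.6%", "100%")

# Remove the literal "%" and convert the remainder to numeric.
avg_peak_perc_num <- as.numeric(sub("%", "", avg_peak_perc_raw, fixed = TRUE))

avg_peak_perc_num
```

Applied to the real column, the same expression could sit inside `mutate()`; any entry that is not a number would come back as `NA` and is worth inspecting.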
# Finding the distinct number of `gamenames`
dist_g <- games %>%
distinct(gamename)
dist_g
# in alpha order
dist_g_alpha <- games %>%
distinct(gamename) %>%
arrange(gamename)
dist_g_alpha
g_payday <- games %>%
filter(gamename == "PAYDAY 2") %>%
filter(year %in% c(2012:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_payday
#makes the PAYDAY 2 dataframe
Years_Payday2 <- as.factor(g_payday$year)
ggplot(g_payday,aes(x = number_of_month,y = avg, group = year, color = Years_Payday2)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#FF0004", "#4363d8", "#911eb4", "#f032e6", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
labs(title = "Payday 2: Average Players by Year")
#the automatic color palette is used if scale_colour_manual() is removed
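The same filter-and-plot pattern is reused for each title in this report, so it could be wrapped in a small function. The sketch below is one way to do that; `plot_game_by_year()` is a hypothetical helper, not part of the original analysis, and it leaves out the direct labels and manual palette to stay short.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: filters one title and draws its monthly average
# players with one line per year.
plot_game_by_year <- function(data, game, title = game) {
  data %>%
    filter(gamename == game) %>%
    mutate(year = factor(year)) %>%
    ggplot(aes(x = number_of_month, y = avg, group = year, colour = year)) +
    geom_line() +
    geom_point() +
    scale_x_continuous(breaks = 1:12) +
    labs(title = paste0(title, ": Average Players by Year"),
         x = "Months", y = "Average Players", colour = "Year")
}

# Toy data with the same column names as `games`; the numbers are made up.
toy_games <- data.frame(
  gamename = "Example Game",
  year = rep(c(2019, 2020), each = 3),
  number_of_month = rep(1:3, times = 2),
  avg = c(100, 120, 90, 150, 170, 160)
)

p <- plot_game_by_year(toy_games, "Example Game")
```

With the real data, a call such as `plot_game_by_year(games, "PAYDAY 2")` would stand in for one of the repeated blocks.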
Why Payday 2 Spiked In 2017
g_pubg <- games %>%
filter(gamename == "PLAYERUNKNOWN'S BATTLEGROUNDS") %>%
filter(year %in% c(2017:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_pubg
Years_PUBG <- as.factor(g_pubg$year)
ggplot(g_pubg,aes(x = number_of_month,y = avg, group = year, color = Years_PUBG)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
labs(title = "PLAYERUNKNOWN'S BATTLEGROUNDS: Average Players by Year")
g_csgo <- games %>%
filter(gamename == "Counter-Strike: Global Offensive") %>%
filter(year %in% c(2012:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_csgo
Years_csgo <- as.factor(g_csgo$year)
ggplot(g_csgo,aes(x = number_of_month,y = avg, group = year, color = Years_csgo)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#FF0004", "#4363d8", "#911eb4", "#f032e6", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
labs(title = "Counter-Strike: Global Offensive: Average Players by Year")
g_gtav <- games %>%
filter(gamename == "Grand Theft Auto V") %>%
filter(year %in% c(2012:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_gtav
Years_gtav <- as.factor(g_gtav$year)
ggplot(g_gtav,aes(x = number_of_month,y = avg, group = year, color = Years_gtav)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#FF0004", "#4363d8", "#911eb4", "#f032e6", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
labs(title = "Grand Theft Auto V: Average Players by Year")
g_dota2 <- games %>%
filter(gamename == "Dota 2") %>%
filter(year %in% c(2012:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_dota2
Years_dota2 <- as.factor(g_dota2$year)
ggplot(g_dota2,aes(x = number_of_month,y = avg, group = year, color = Years_dota2)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#FF0004", "#4363d8", "#911eb4", "#f032e6", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
scale_y_continuous(labels = comma) +
labs(title = "DOTA 2: Average Players by Year")
CS:GO was also made free to play (in December 2018), so its player counts show similarities with the other games that were given out for free over time.
These data suggest that many multiplayer games gain players over time and often reach their peak average players significantly after launch.
g_fallout4 <- games %>%
filter(gamename == "Fallout 4") %>%
filter(year %in% c(2012:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_fallout4
Years_fallout4 <- as.factor(g_fallout4$year)
ggplot(g_fallout4,aes(x = number_of_month,y = avg, group = year, color = Years_fallout4)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#FF0004", "#4363d8", "#911eb4", "#f032e6", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
labs(title = "Fallout 4: Average Players by Year")
g_farcry5 <- games %>%
filter(gamename == "Far Cry 5") %>%
filter(year %in% c(2012:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_farcry5
Years_farcry5 <- as.factor(g_farcry5$year)
ggplot(g_farcry5,aes(x = number_of_month,y = avg, group = year, color = Years_farcry5)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#FF0004", "#4363d8", "#911eb4", "#f032e6", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
labs(title = "Far Cry 5: Average Players by Year")
g_cyberpunk <- games %>%
filter(gamename == "Cyberpunk 2077") %>%
filter(year %in% c(2012:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_cyberpunk
Years_cyberpunk <- as.factor(g_cyberpunk$year)
ggplot(g_cyberpunk,aes(x = number_of_month,y = avg, group = year, color = Years_cyberpunk)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#FF0004", "#4363d8", "#911eb4", "#f032e6", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
labs(title = "Cyberpunk 2077: Average Players by Year")
g_girl <- games %>%
filter(gamename == "Hentai Girl") %>%
filter(year %in% c(2012:2021)) %>%
mutate(label = if_else(number_of_month == max(number_of_month), as.character(year), NA_character_))
g_girl
Years_girl <- as.factor(g_girl$year)
ggplot(g_girl,aes(x = number_of_month,y = avg, group = year, color = Years_girl)) +
geom_line(size = 1) +
geom_point()+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1))+
geom_dl(aes(label = year), method = list(dl.trans(x = x + 0.1), "last.points", cex = 1)) +
scale_colour_manual(values=c("#964B00","#f58231","#030000", "#00FBFE", "#3cb44b", "#FF0004", "#4363d8", "#911eb4", "#f032e6", "#a9a9a9" )) +
xlab("Months")+
ylab("Average Players") +
labs(title = "Hentai Girl: Average Players by Year")
These data suggest that single-player titles see a drastic fall-off in their player base following launch. This most likely occurs because once someone beats a single-player game, they are less likely to return to it. It can also be attributed to the fact that multiplayer/esports games are usually updated continually with new content, whereas single-player games are less likely to receive such updates consistently.
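One way to put a number on this fall-off is to express each month's average players as a share of the launch-month average. The sketch below uses made-up numbers for two hypothetical titles; `months_since_launch` and `retained` are names introduced here for illustration only.

```r
library(dplyr)

# Toy data: made-up monthly averages for two hypothetical titles.
toy <- tibble(
  gamename = rep(c("Single Player X", "Multiplayer Y"), each = 4),
  months_since_launch = rep(0:3, times = 2),
  avg = c(1000, 400, 200, 100, 1000, 900, 1100, 1200)
)

# Share of the launch-month average still playing in each later month.
retention <- toy %>%
  group_by(gamename) %>%
  arrange(months_since_launch, .by_group = TRUE) %>%
  mutate(retained = avg / first(avg)) %>%
  ungroup()

retention %>% filter(months_since_launch == 3)
```

In this toy example the single-player title keeps 10% of its launch audience after three months, while the multiplayer title grows to 120%.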
games_seasons <- games %>%
filter(year %in% c(2012:2021)) %>%
group_by(number_of_month) %>%
summarise(avg_month_sum = sum(avg))
ggplot(data = games_seasons) +
geom_line(aes(x = number_of_month, y = avg_month_sum),color = "blue", size = 1) +
geom_point(aes(x = number_of_month, y = avg_month_sum),color = "red", size = 3)+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1)) +
scale_y_continuous(labels = comma) +
xlab("Months")+
ylab("Average Number of Players") +
labs(title = "Steam Seasonality using Avg. Players")
games_seasons
games_seasons_pk <- games %>%
filter(year %in% c(2012:2021)) %>%
group_by(number_of_month) %>%
summarise(peak_month_sum = sum(peak))
ggplot(data = games_seasons_pk) +
geom_line(aes(x = number_of_month, y = peak_month_sum),color = "blue", size = 1) +
geom_point(aes(x = number_of_month, y = peak_month_sum),color = "red", size = 3)+
scale_x_continuous(breaks = seq(1, 12, by = 1),expand=c(0, 1)) +
scale_y_continuous(labels = comma) +
xlab("Months")+
ylab("Peak Number of Players") +
labs(title = "Steam Seasonality using Peak Players")
games_seasons_pk
The data suggest a seasonality in Steam's player counts. Steam has the most players around December, January, and February. We suspect this occurs because of the Christmas season, when people receive money and games as gifts. The steep fall-off we see in February may occur because individuals have completed the games they received during the holiday season. We also see an increase in players during the summer months. We suspect this occurs because individuals are out of school for their summer break, giving them more time to play video games on Steam.
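Because the dataset starts in July 2012 and ends in February 2021, summing over all years counts some calendar months more often than others, and later years carry more weight as the platform grows. A possible cross-check, sketched here on made-up numbers, is to keep only complete years and average each month's share of its own year's total.

```r
library(dplyr)

# Toy data: made-up monthly values for two complete years and one partial year.
toy <- tibble(
  year = c(rep(2019, 12), rep(2020, 12), rep(2021, 2)),
  number_of_month = c(1:12, 1:12, 1:2),
  avg = c(rep(100, 12), rep(200, 11), 400, 500, 500)
)

seasonal_share <- toy %>%
  group_by(year, number_of_month) %>%
  summarise(total = sum(avg), .groups = "drop") %>%  # one row per year-month
  group_by(year) %>%
  filter(n() == 12) %>%                              # keep complete years only
  mutate(share = total / sum(total)) %>%             # month's share of its own year
  group_by(number_of_month) %>%
  summarise(mean_share = mean(share), .groups = "drop")

seasonal_share
```

If the December-to-February bump survives this normalisation on the real data, it is less likely to be an artefact of uneven month coverage.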
games_user_growth <- games %>%
filter(year %in% c(2012:2020)) %>%
group_by(year) %>%
summarise(avg_month_all_years = sum(avg))
ggplot(data = games_user_growth) +
geom_line(aes(x = year, y = avg_month_all_years),color = "blue", size = 1) +
geom_point(aes(x = year, y = avg_month_all_years),color = "red", size = 3) +
scale_x_continuous(breaks = seq(2012, 2020, by = 1)) +
scale_y_continuous(labels = comma) +
xlab("Years")+
ylab("Average Number of Players") +
labs(title = "Steam Platform Growth using Avg. Players")
games_user_growth
games_user_growth_pk <- games %>%
filter(year %in% c(2012:2020)) %>%
group_by(year) %>%
summarise(peak_month_all_years = sum(peak))
ggplot(data = games_user_growth_pk) +
geom_line(aes(x = year, y = peak_month_all_years),color = "blue", size = 1) +
geom_point(aes(x = year, y = peak_month_all_years),color = "red", size = 3) +
scale_x_continuous(breaks = seq(2012, 2020, by = 1)) +
scale_y_continuous(labels = comma) +
xlab("Years")+
ylab("Peak Number of Players") +
labs(title = "Steam Platform Growth using Peak Players")
games_user_growth_pk
The graph shows that the Steam platform grew relatively consistently from 2012 to 2020. The only drop in users occurred from 2018 to 2019. We suspect this is related to the rise of other online game marketplaces, such as the Epic Games Store, and to the popularity of Fortnite at the time. In November of 2018, Fortnite hit 8.3 million concurrent players. We suspect that the popularity of Fortnite, combined with the fact that it was not on Steam, is ultimately what caused the dip in players from 2018 to 2019.
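The size of each yearly change can be made explicit with a year-over-year percentage. The sketch below uses made-up yearly totals; `toy_growth` and `pct_change` are illustrative names, and the same `mutate()` could be applied to the yearly summary computed earlier.

```r
library(dplyr)

# Toy yearly totals (made-up numbers) standing in for a per-year summary.
toy_growth <- tibble(
  year = 2016:2019,
  total_avg = c(100, 150, 180, 171)
)

# Percentage change relative to the previous year; the first year has no
# predecessor, so its value is NA.
yoy <- toy_growth %>%
  arrange(year) %>%
  mutate(pct_change = (total_avg - lag(total_avg)) / lag(total_avg) * 100)

yoy
```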
g_without_2021 <- games %>%
filter(number_of_month == 12, year %in% 2012:2020)
g_added <- g_without_2021 %>%
ggplot()+
geom_bar(aes(x = year))+
scale_x_continuous(breaks = seq(2012, 2020, by = 1)) +
xlab("Years")+
ylab("Number of Games") +
labs(title = "Number of Games Published on Steam")
g_added
g_added_count <- g_without_2021 %>%
count(year)
grow_diff_titles <- c(268,268,393,549,720,911,1034,1101,1165)
g_added_count$differ <- grow_diff_titles
g_added_count <- g_added_count %>%
mutate(subtracted = n - differ)
g_added_count %>%
ggplot() +
geom_line(aes(x = year, y = subtracted, group = 1), size = 1, color = "red")+
geom_point(aes(x = year, y = subtracted, group = 1), size = 3, color = "red")+
scale_x_continuous(breaks = seq(2012, 2020, by = 1)) +
xlab("Years")+
ylab("Numbers of Games") +
labs(title = "Growth of Games Published to Steam by Year")
g_added_count
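The hard-coded `grow_diff_titles` vector appears to hold the previous year's count, with the first year compared against itself. If that reading is right, the same column could be derived with `lag()` instead of typed by hand. The sketch below shows the idea on made-up counts.

```r
library(dplyr)

# Toy per-year counts (made-up numbers) standing in for the output of count(year).
toy_counts <- tibble(year = 2012:2015, n = c(10L, 14L, 21L, 25L))

toy_counts <- toy_counts %>%
  arrange(year) %>%
  mutate(
    previous = lag(n, default = first(n)),  # first year is compared with itself
    added = n - previous                    # titles gained since the previous year
  )

toy_counts
```

Deriving the column this way keeps it in sync if the year filter or the underlying data changes.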
In this analysis we are looking at what we consider to be relevant titles published on the Steam platform. Anyone can publish a game on Steam, and as of 2020 there were nearly 50,000 titles published on the platform. The dataset we use is drawn from a source that focuses on games that are actually played. The graphs show that the game library on Steam has been increasing over the years. However, we can also see a large dip in the number of relevant games published to the platform between 2016 and 2018.
games %>%
filter(avg == max(avg))
PLAYERUNKNOWN’S BATTLEGROUNDS is the game with the highest monthly average player count.
games %>%
filter(peak == max(peak))
The game with the highest peak players in Steam history is PLAYERUNKNOWN’S BATTLEGROUNDS with 3,236,027 players.
games %>%
drop_na(gain) %>%
filter(gain == max(gain))
PLAYERUNKNOWN’S BATTLEGROUNDS has had the highest peak and average players, as well as the greatest gain from the prior month, over the period covered by this dataset.
games %>%
drop_na(gain) %>%
filter(gain == min(gain))
It makes sense that Cyberpunk 2077 had the biggest loss of players from one month to the next, because the game was filled with bugs that made it hard to enjoy, so many people returned it.
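The extreme-value lookups above can also be written with `slice_max()` and `slice_min()` from dplyr, which return the top or bottom row directly. The sketch below runs on a made-up three-row table; the game names and numbers are placeholders.

```r
library(dplyr)

# Toy table with the same column names as `games`; values are made up.
toy <- tibble(
  gamename = c("A", "B", "C"),
  avg = c(10, 30, 20),
  gain = c(NA, 5, -2)
)

top_avg <- toy %>% slice_max(avg, n = 1)

# Rows with a missing gain (a title's first recorded month) are dropped first.
top_gain <- toy %>% filter(!is.na(gain)) %>% slice_max(gain, n = 1)
biggest_loss <- toy %>% filter(!is.na(gain)) %>% slice_min(gain, n = 1)
```

Raising `n` gives a ranked shortlist rather than a single row, which helps show how far ahead the top title is.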
Our data suggest that multiplayer games can hit their peak well after release, often following a new content update. This is not always the case, though. Single-player games tend to peak at release and then decline as players finish the game. We were able to use our data to see the seasonality of Steam games and conclude that December, January, and February are the most actively played months. Lastly, our analysis concludes that Steam is a steadily, consistently growing platform that is used by millions daily.